APPORTIONING A WORK UNIT TO
EXECUTE IN PARALLEL IN A
HETEROGENEOUS ENVIRONMENT

PROVISIONAL APPLICATION 

This application claims the benefit of U.S. Provisional 
Application No. 60/064,753, entitled "METHOD FOR
APPORTIONING A WORK UNIT TO EXECUTE IN PARALLEL
IN A HETEROGENEOUS ENVIRONMENT,"
filed on Nov. 7, 1997, by Ted E. Blank, et al., which is 
incorporated by reference herein. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates in general to computer- 
implemented processing systems, and, in particular, to a 
technique for apportioning a work unit to execute in parallel 
in a heterogeneous environment. 

2. Description of Related Art 

A relatively recent innovation in computer systems has 
been to distribute processing of units of work (e.g., a unit of 
work may be a computer task that selects data from a 
database) to multiple computer systems connected by a 
network. Each distributed computer system has one or more 
processors for executing units of work. A distributed com- 
puter system has many advantages over a centralized 
scheme with regard to flexibility and cost effectiveness. 

When a computer system consists of multiple processors 
(i.e., central processors or "CPs"), each processor may have 
a different processing power. A processor is part of a 
computer system that executes instructions to complete a 
unit of work. Processor power indicates how quickly a 
processor is able to execute a unit of work. Typically, the 
processing power is represented with reference to how many 
millions of instructions per second (MIPS) that a processor 
executes. 

A multitude of configurations are possible for a multi- 
processor computer system. For example, a user might wish 
to run a query (i.e., which is an example of a unit of work) 
in a computer system that consists of a 2-way system (i.e., 
a computer system that includes 2 processors) and an 8-way 
system (i.e., a computer system that includes 8 processors). 

In a distributed environment, work generated at one 
computer system on the network may be divided and dis- 
tributed to other computer systems on the network for 
processing. In many situations, the workload is not distrib- 
uted efficiently. For example, a slower processor may be 
given the same amount of work as a faster processor. If this 
were to happen, the faster processor would complete its 
processing and wait for the slower processor to complete its 
processing. It is a waste of resources to let the faster 
processor wait idly. Thus, to efficiently utilize the 
processors, it is desirable to make optimal assignments of 
work to each available processor. 

Some conventional computer systems that distribute work 
across processors require a homogeneous configuration. In a 
homogeneous configuration, each processor has the same 
processing power. In a homogeneous configuration, work is 
simply apportioned evenly across each processor (i.e., each 
processor is given the same amount of work). However, the 
homogeneous configuration is not always the way users 
grow their computer systems. 

One implication of the ability of networked computer 
systems to grow or shrink with time is that processors within 
each computer system may be entirely different. Some 
processors may be purchased at later times and have advantages
due to improved technology, and some processors may
have more processing power than others. Additionally, the
networked computer systems may originally contain processors
optimized for different purposes, from desktop computers
to massively parallel processors (MPPs).

In a heterogeneous environment, each processor may
have a different processing power. Therefore, apportioning
work in a heterogeneous environment is challenging. Some
conventional computer systems divide a work unit into even
portions and distribute the same amount of work to each
processor, regardless of the processing power of each processor.
In this case, it is likely that slower processors will
create a bottleneck that can affect the total elapsed time of
processing the work unit. Additionally, dividing a work unit
up evenly may prevent the full use of a faster processor's
capabilities since the slowest processor in the configuration
will always be the last to finish.
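
By way of illustration only, the following Python sketch compares the elapsed time of an even split with that of a split proportional to processing power. The function names, the two processing power values, and the total amount of work are hypothetical and are not taken from the specification.

```python
def elapsed_time(work_per_processor, speeds):
    """Elapsed time is set by the processor that finishes last."""
    return max(w / s for w, s in zip(work_per_processor, speeds))


def even_split(total_work, speeds):
    """Give every processor the same amount of work."""
    share = total_work / len(speeds)
    return [share] * len(speeds)


def proportional_split(total_work, speeds):
    """Give each processor work in proportion to its processing power."""
    total_speed = sum(speeds)
    return [total_work * s / total_speed for s in speeds]


# Hypothetical values: one fast and one slow processor.
speeds = [200.0, 100.0]
total_work = 300.0

even_time = elapsed_time(even_split(total_work, speeds), speeds)
prop_time = elapsed_time(proportional_split(total_work, speeds), speeds)
```

Under these assumed values the even split is limited by the slower processor, while the proportional split lets both processors finish together.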

One aspect of the flexibility of networked computer
systems is that their configuration can be changed easily. For
example, a processor in a computer system assigned to the
accounting department might be available to every computer
system belonging to the network of computer systems most
of the time; but, when the accounting department has a peak
load, that processor may be made unavailable to the network
of computer systems for a day to dedicate it to the work of
the accounting department. Additionally, due to fluctuations
in the workload, or due to imperfections in the work
allocation technique, some processors may develop a backlog
of work that has been assigned to them, while other
processors are idle.

There is a need for an improved allocation technique to
provide efficient allocation of work to processors.

SUMMARY OF THE INVENTION

To overcome the limitations in the prior art described 
above, and to overcome other limitations that will become 
apparent upon reading and understanding the present 
specification, the present invention discloses a method, 
apparatus, and article of manufacture for a computer- 
implemented apportioning system. 

In accordance with the present invention, work is distributed
to processors in a multi-processor computer system.
Initially, during bind-time, a scaling factor is determined for
each processor. The scaling factor represents relative processing
power in relation to each other processor. Then,
portions of a total amount of work are distributed to each
processor based on the determined scaling factor of that
processor and a determined amount of work for an average
processor.

An object of the invention is to provide an improved
technique for distributing work across processors in one
computer system. Another object of the invention is to
provide a technique for distributing work across computer
systems having one or more processors and connected by a
network. Yet another object of the invention is to provide a
technique for distributing work across processors so that
each of the processors completes processing at approximately
the same time.

BRIEF DESCRIPTION OF THE DRAWINGS 

Referring now to the drawings in which like reference 
numbers represent corresponding parts throughout: 
FIG. 1 illustrates an exemplary computer hardware environment
that could be used in accordance with the present
invention;
FIG. 2 illustrates an exemplary network environment
connecting multiple computer systems used to implement
the preferred embodiment of the invention;

FIG. 3 is a flow diagram illustrating the steps performed
by the apportioning system during a bind-time phase;

FIG. 4 is a flow diagram illustrating the steps performed
by the apportioning system during a run-time phase;

FIGS. 5A-5C illustrate an example of the apportioning
system performing a CPU bound query during a bind-time
phase;

FIGS. 6A-6C illustrate an example of the apportioning
system performing a CPU bound query during a run-time
phase in which an additional processor has been added since
the bind-time phase;

FIGS. 7A-7C illustrate an example of the apportioning
system performing a CPU bound query during a run-time
phase with a limit on the buffer space; and

FIGS. 8A-8C illustrate an example of the apportioning
system performing an I/O bound query during a bind-time
phase.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT

In the following description of the preferred embodiment,
reference is made to the accompanying drawings which
form a part hereof, and which is shown by way of illustration
a specific embodiment in which the invention may be
practiced. It is to be understood that other embodiments may
be utilized as structural changes may be made without
departing from the scope of the present invention.

Hardware Environment

FIG. 1 illustrates an exemplary computer hardware environment
that could be used in accordance with the present
invention. In the exemplary environment, a computer system
102 is comprised of one or more processors connected to one
or more data storage devices 104 and 106 that store one or
more relational databases, such as a fixed or hard disk drive,
a floppy disk drive, a CDROM drive, a tape drive, or other
device.

Operators of the computer system 102 use a standard
operator interface 108, such as IMS/DB/DC®, CICS®,
TSO®, OS/390®, ODBC® or other similar interface, to
transmit electrical signals to and from the computer system
102 that represent commands for performing various search
and retrieval functions, termed queries, against the databases.
In the present invention, these queries conform to the
Structured Query Language (SQL) standard, and invoke
functions performed by Relational DataBase Management
System (RDBMS) software.

In the preferred embodiment of the present invention, the
present invention comprises the DB2® product offered by
IBM for the OS/390® operating system. Those skilled in the
art will recognize, however, that the present invention has
application program to any software and is not limited to the
DB2® product.

As illustrated in FIG. 1, the DB2® system for the
OS/390® operating system includes three major components:
the Internal Resource Lock Manager (IRLM) 110, the
Systems Services module 112, and the Database Services
module 114. The IRLM 110 handles locking services for the
DB2® system, which treats data as a shared resource,
thereby allowing any number of users to access the same
data simultaneously. Thus concurrency control is required to
isolate users and to maintain data integrity. The Systems
Services module 112 controls the overall DB2® execution
environment, including managing log data sets 106, gathering
statistics, handling startup and shutdown, and providing
management support.

At the center of the DB2® system is the Database
Services module 114. The Database Services module 114
contains several submodules, including the Relational Database
System (RDS) 116, the Data Manager 118, the Buffer
Manager 120 (which manages the buffer space (i.e., memory
reserved for use by a processor to execute a unit of work)),
the Apportioning System 124, and other components 122
such as an SQL compiler/interpreter. These submodules
support the functions of the SQL language, i.e. definition,
access control, interpretation, compilation, database
retrieval, and update of user and system data. The Apportioning
System 124 works in conjunction with the other
submodules to apportion work across multiple processors.

SQL statements are interpreted and executed in the DB2
system. The SQL statements are input to a pre-compiler.
There are two outputs from the pre-compiler: a modified
source module and a Database Request Module (DBRM).
The modified source module contains host language calls to
DB2, which the pre-compiler inserts in place of SQL statements.
The DBRM consists of the SQL statements. A
compile and link-edit component uses the modified source
module to produce a load module, while an optimize and
bind component uses the DBRM to produce a compiled set
of runtime structures for the application plan. The SQL
statements specify only the data that the user wants, but not
how to get to it. The optimize and bind component may
reorder the SQL query. Thereafter, the optimize and bind
component considers both the available access paths
(indexes, sequential reads, etc.) and system held statistics on
the data to be accessed (the size of the table, the number of
distinct values in a particular column, etc.), to choose what
it considers to be the most efficient access path for the query.
The load module and application plan are then executed
together.

Generally, the Apportioning System 124 and the instructions
derived therefrom, are all tangibly embodied in a
computer-readable medium, e.g. one or more of the data
storage devices 104 and 106. Moreover, the Apportioning
System 124 and the instructions derived therefrom, are all
comprised of instructions which, when read and executed by
the computer system 102, causes the computer system 102
to perform the steps necessary to implement and/or use the
present invention. Under control of an operating system, the
Apportioning System 124 and the instructions derived
therefrom, may be loaded from the data storage devices 104
and 106 into a memory of the computer system 102 for use
during actual operations.

Thus, the present invention may be implemented as a
method, apparatus, or article of manufacture using standard
programming and/or engineering techniques to produce
software, firmware, hardware, or any combination thereof.
The term "article of manufacture" (or alternatively, "computer
program product") as used herein is intended to
encompass a computer program accessible from any
computer-readable device, carrier, or media. Of course,
those skilled in the art will recognize many modifications
may be made to this configuration without departing from
the scope of the present invention.

Those skilled in the art will recognize that the exemplary
environment illustrated in FIG. 1 is not intended to limit the
present invention. Indeed, those skilled in the art will
recognize that other alternative hardware environments may
be used without departing from the scope of the present
invention.

FIG. 2 illustrates an exemplary network environment
connecting multiple computer systems used to implement
the preferred embodiment of the invention. Multiple, separate
DB2 systems, 200, 202, 204, and 206, each having a
processor and an apportioning system, are connected by a
network 208. Each of the DB2 systems is illustrated with
one processor; however, each of the computer systems could
have multiple processors. Additionally, each of the computer
systems could include other components, such as connected
data storage devices. The apportioning system 124 is able to
apportion work to processors that are in separate computer
systems connected by a network, as will be described in
further detail below.

Apportioning a Work Unit to Execute in Parallel in a 
Heterogeneous Environment 

The apportioning system 124 of the present invention
apportions a work unit across multiple processors to give
faster processors more work than slower processors so that
all the processors complete in approximately the same
amount of time. The apportioning system 124 achieves this
by taking into account system configuration factors that have
a substantial impact on the optimization of an allocation.

In particular, the apportioning system 124 determines a
scaling factor for each processor based on the average speed
of all the processors. The scaling factor represents relative
processing power in relation to each other processor. A faster
than average processor will have a scaling factor greater
than one, while a slower than average processor will have a
scaling factor less than one.

Then, the apportioning system 124 divides the work unit
into partitions based on the average processor speed to
obtain the amount of work performed by an average processor
(i.e., average amount of work per processor). This
amount of work per average processor is multiplied by the
processor's scaling factor to determine the actual amount of
work for each processor. In this way, a faster than average
processor gets more work than a slower than average
processor.
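
The two determinations just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and the list of four MIPS ratings are hypothetical, and the work unit is treated as a simple numeric quantity.

```python
def scaling_factors(proc_mips):
    """Scaling factor of each processor: its MIPS over the average MIPS."""
    avg_mips = sum(proc_mips) / len(proc_mips)
    return [mips / avg_mips for mips in proc_mips]


def apportion(total_work, proc_mips):
    """Multiply the average amount of work per processor by each scaling factor."""
    avg_work = total_work / len(proc_mips)
    return [factor * avg_work for factor in scaling_factors(proc_mips)]


# Hypothetical configuration: one faster processor and three slower ones.
proc_mips = [200, 100, 100, 100]
factors = scaling_factors(proc_mips)
shares = apportion(100, proc_mips)
```

Because the scaling factors average to one, the shares add up to the whole work unit, and the faster than average processor receives the largest share.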

In a bind-time phase, the apportioning system 124 makes
an initial determination of how much work to apportion to
each processor for each system. In a DB2® system, "bind-time"
refers to the time during which the optimize and bind
component determines the most efficient access path for the
query, while "run-time" refers to the time during which the
SQL statements submitted to the DB2® system are
executed.

The resources available during bind-time (e.g., the number
of processors or buffer space) may change by run-time.
In a run-time phase, the apportioning system 124 determines
whether the number of available processors has changed
from the bind-time phase and whether there are adequate
other resources (e.g., available memory, in addition to buffer
space, to accommodate the desired number of parallel tasks)
to process the work. If any of these conditions exist, the
apportioning system 124 re-evaluates the amount of work to
apportion to each processor in each system to determine an
optimum workload balance for the current run-time environment.
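
One possible form of the run-time check is sketched below in Python. The `Resources` record, its fields, and the `needs_reapportionment` function are hypothetical names introduced for illustration; the sketch assumes that re-evaluation is triggered either by a change in the number of processors or by resources too small for the desired number of parallel tasks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    processors: int          # number of available processors
    max_parallel_tasks: int  # hypothetical: tasks that memory and buffer space can hold


def needs_reapportionment(bind_time, run_time, desired_tasks):
    """Return True when the bind-time apportionment should be re-evaluated."""
    processors_changed = run_time.processors != bind_time.processors
    resources_inadequate = run_time.max_parallel_tasks < desired_tasks
    return processors_changed or resources_inadequate


# Hypothetical configurations.
at_bind = Resources(processors=20, max_parallel_tasks=20)
more_processors = Resources(processors=25, max_parallel_tasks=25)
less_buffer = Resources(processors=20, max_parallel_tasks=10)
```

When the function returns True, the apportionment would be determined again using the run-time configuration in place of the bind-time configuration.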

The following example is provided to illustrate the use of
the apportioning system 124 in one embodiment of the invention.
Initially, the apportioning system 124 receives a query
(e.g., from a user or an application). The query is a request
for data from a database.

Databases are computerized information storage and
retrieval systems. A Relational Database Management System
(RDBMS) is a database management system (DBMS)
which uses relational techniques for storing and retrieving
data. Relational databases are organized into tables which
consist of rows and columns of data. The rows are formally
called tuples or records. A database will typically have many
tables and each table will typically have multiple tuples and
multiple columns. The tables are typically stored on direct
access storage devices (DASD), such as magnetic or optical
disk drives for semi-permanent storage.

A table can be divided into partitions, with each partition 
containing a portion of the table's data. By partitioning 
tables, the speed and efficiency of data access can be 
improved. For example, partitions containing more fre- 
quently used data can be placed on faster data storage 
devices, and parallel processing of data can be improved by 
spreading partitions over different DASD volumes, with 
each I/O stream on a separate I/O path. Partitioning also 
promotes high data availability, enabling application and 
utility activities to progress in parallel on different partitions 
of data. 

Data may be distributed among partitions by a variety of 
schemes ("partitioning schemes"). One partitioning scheme 
assigns data to partitions according to a boundary value 
present in specified columns of the data row. The boundary 
value is the data value that separates each partition from the 
next partition. In one database system, the DB2® product 
offered by International Business Machines Corporation, 
Armonk, N.Y., a range of values is associated with each 
table partition by means of a CREATE INDEX statement. 
The CREATE INDEX statement gives the boundary value 
for each partition. 
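
A boundary-value partitioning scheme of this kind can be sketched in Python as follows. The sketch assumes, for illustration only, that each boundary value is the highest key stored in its partition; the specification does not state whether the boundary is inclusive. The function name and the boundary values are hypothetical.

```python
from bisect import bisect_left


def partition_for(key, boundaries):
    """Return the 0-based index of the partition that holds `key`.

    `boundaries` is sorted in ascending order; entry i is assumed to be
    the highest key value stored in partition i.
    """
    index = bisect_left(boundaries, key)
    if index == len(boundaries):
        raise ValueError("key exceeds the last boundary value")
    return index


# Hypothetical boundary values for a three-partition table.
boundaries = [100, 200, 300]
first = partition_for(100, boundaries)   # falls in partition 0
second = partition_for(101, boundaries)  # falls in partition 1
```

A binary search keeps the lookup cost logarithmic in the number of partitions, which matters little for three partitions but is convenient for larger tables.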

To perform the query, the apportioning system 124 iden- 
tifies the partitions of tables referenced by the query that 
may be executed in parallel on multiple processors. The 
apportioning system 124 determines how many of these 
partitions should be apportioned to each available processor 
to provide efficient use of each processor. When the apportioning
system 124 is used in conjunction with work that is
not already partitioned (e.g., partitions of a table), the 
apportioning system 124 partitions the work. 

During a bind-time phase, the apportioning system 124 
determines the number of partitions to be apportioned to 
each available processor based on the configuration of the 
system at bind-time. The apportioning system 124 looks at 
various elements of configuration, including the number of 
processors available and the amount of memory available 
for the processors to execute tasks. 

The apportioning system 124 determines a processor 
power value that represents the capability of each of the 
processors that may be apportioned work. Different measures
may be used to determine the processor power value.
For example, a rating for the number of millions of instruc- 
tions per second (MIPS) the processor is capable of execut- 
ing may be used as a processor power value. 

The apportioning system 124 determines the total capa- 
bility ("total processing power") of a system by adding the 
processor power values of the processors in the system. 
When processing power is represented by MIPS, the appor- 
tioning system 124 sums the MIPS of each processor. 

The apportioning system 124 determines the average 
capability ("average processing power") of the processors in 
the system by dividing the total capability by the number of 
processors selected. 

The apportioning system 124 then determines the number 
of partitions that would be allocated to an average processor. 
This number would be the optimal number to assign to each 
processor if the computing system were a homogeneous 
symmetrical multiprocessor (SMP) system. 

In the preferred embodiment, the number of partitions to 
apportion to each processor is adjusted based on the capa- 
bility of each processor. In particular, the apportioning 
system 124 scales the number of partitions to be allocated to
an average processor by the ratio of the capability associated
with that processor to the average capability of the selected
processors.

During a run-time phase, the apportioning system 124
re-evaluates the apportionment of partitions to each processor
based on the configuration of the computer system at
run-time. For example, if after the bind-time phase, the
apportioning system 124 determines that the number of
processors or the amount of available memory has changed,
the apportioning system 124 re-determines the apportionment
of partitions to processors. The apportioning system
124 then apportions partitions to each processor according to
the number of partitions it has determined for that processor.

In an alternative embodiment, the apportioning system
124 may determine a number of partitions to be apportioned
to a cluster of processors. If the processors within the cluster
have identical capability, the apportioning system 124 multiplies
the number of processors in the cluster by the
capability of any of the processors within the cluster to
identify an average capability and uses the average capability
in determining apportionment; otherwise, the apportioning
system 124 sums the scale factors for each processor
within the cluster and uses the summed scale factor to
apportion work.

In another alternative embodiment, the apportioning system
can determine that one or more processors are orders of
magnitude faster than one or more other processors. Then,
the apportioning system 124 could send all of the work to be
performed to the faster processors, without sending work to
the slower processors.

Bind-Time Phase

The following elements are performed by the apportioning
system 124 in one embodiment of the invention.

1. First, the apportioning system 124 determines the
average MIPS, represented by "avgMIPS". The average
MIPS is the average capability for all processors and is equal
to the sum of each processor's MIPS, which is represented
by "sum(procMIPS)", divided by the number of processors,
which is represented by "proc#".

avgMIPS=sum(procMIPS)/proc#

2. Then, the apportioning system 124 determines the
scaling factor, which is represented by "procScaleFactor",
for each processor equal to the processor's MIPS, which is
represented by "procMIPS", divided by the average MIPS
for all processors, which is represented by "avgMIPS".

procScaleFactor=procMIPS/avgMIPS

3. Next, the apportioning system 124 determines the
amount of work for an average processor, which is represented
as "avgWork", based on the homogenous SMP environment
of average processors. The work for an average
processor is equal to the number of processors, "proc#",
of a computer system times a value, which is represented by
"boundVal", that is based on whether work is CPU bound or
I/O bound. In particular, the boundVal parameter equals the
number of partitions divided by the degree of parallelism. If
the work is CPU bound, the degree of parallelism will
approach the number of CPU processes available. If the
work is I/O bound, the degree of parallelism will approach
the number of tasks desired.

avgWork=proc#*boundVal

4. The apportioning system 124 determines the amount of
work for each actual processor, which is represented by
"procWork", equal to the processor's scale factor, which is
represented by "procScaleFactor", times the amount of work
for the average processor, which is represented by "avgWork".

procWork=procScaleFactor*avgWork

FIG. 3 is a flow diagram illustrating the steps performed
by the apportioning system 124 during a bind-time phase. In
Block 300, the apportioning system 124 determines an
average processing power for the processors based on the
MIPS of each processor. In Block 302, the apportioning
system 124 determines a scaling factor for each processor,
wherein the scaling factor represents relative processing
power in the form of MIPS. In Block 304, the apportioning
system 124 distributes the total amount of work to be
performed to each processor based on the scaling factor and
the amount of work that can be handled by an average
processor.

Run-Time Phase

If the resources change from bind-time to run-time, the
apportioning system 124 re-determines the distribution of
work to processors. The change in resources could be any
type of change, including an increase, decrease, or addition,
and could be any type of resource, such as storage space. For
example, the resource change could be an increase in the
number of available processors or could be a limit on the
buffer space needed by the processors to accommodate
parallel tasks. Therefore, the apportioning system
re-executes the bind-time phase with the changed resources
(e.g., more available processors or a new limit on the
maximum number of tasks which can be accommodated due
to the limit on buffer space). For example, for each query
received, the apportioning system 124, at the beginning of
run-time, determines whether resources have changed. In
particular, the apportioning system 124 checks the configuration
of a computer system to determine whether there has
been a change in the number of processors since bind-time.
Additionally, the apportioning system 124 checks the
buffer space to determine whether the buffer
space can accommodate the number of tasks to be executed
in parallel.

FIGS. 5A-5C, 6A-6C, 7A-7C and 8A-8C below illustrate
examples of the use of the apportioning system 124.

EXAMPLES

FIGS. 5A-5C illustrate an example of the apportioning
system 124 performing a CPU bound query during a bind-time
phase. In this example, a database table that is referenced
by a query is divided into 100 partitions (e.g., data sets
or files). To perform the query, the apportioning system 124
will submit instructions to different processors to perform
the query on partitions of the database table.

Information table 500 provides the names of the DB2
systems 502 that could execute work, the number of processors
504 of each DB2 system 502, the MIPS 506
executed by each processor 504 of each DB2 system 502,
and the total MIPS 508 for each DB2 system 502. For
example, DB2 System A has eight processors, each executing
200 MIPS, with a total processing power of 1600 MIPS.

Using the total number of processors 510 (i.e., 20) for all
of the computer systems and the total MIPS 512 (i.e., 2800)
for all of the DB2 systems, the apportioning system 124
determines that the average MIPS is as follows:

avgMIPS=2800/20=140 MIPS

Next, the apportioning system 124 determines scaling
factors for each DB2 system. Information Table 520 identifies
the DB2 systems 522 and their scaling factors 524. For
example, DB2 System A has a scaling factor of 1.428 (i.e.,
200 MIPS/140 MIPS), which reflects the higher processing
power of DB2 System A relative to DB2 System B with a
scaling factor of 0.714 and DB2 System C with a scaling
factor of 0.714.

Information Table 530 identifies the DB2 systems 532,
along with the determination of the amount of work for an
average processor 534. In a homogenous SMP environment
in which the work is CPU bound, the apportioning system
124 determines the amount of work for an average processor
based on the number of processors and a bound value that
reflects that the work is CPU bound. The apportioning
system 124 determines the bound value by dividing the
number of partitions (i.e., 100) by the number of processors
(i.e., 20). Thus, the apportioning system 124 determines that
there are to be 5 partitions/processor, which is the bound
value. The amount of work for an average processor in a
DB2 system 534 is equivalent to the number of processors
of the computer system times the bound value. For example,
DB2 System A has 8 processors and the bound value is 5, so
40 is an indicator of the amount of work of an average
processor in DB2 System A.

Once the amount of work for an average processor is
determined, the apportioning system 124 determines the
amount of work to be actually distributed to each DB2
system 536. The amount to be distributed to each DB2
system is the scaling factor for that computer system times
the amount of work for an average processor. For example,
for DB2 System A, the amount of work to be distributed to
processors in DB2 System A is equal to the scaling factor of
1.428 times the amount of work an average processor in the
computer system could perform, which is 40. Additionally,
each processor in the DB2 system gets an amount of work
equal to the amount of work distributed to the DB2 system
divided by the number of processors for that DB2 system.
For example, for DB2 System A, each processor gets 7.14 of
the work (i.e., 57.13/8). The desired tasks 538 are the
number of tasks to be distributed to each DB2 system. The
number of desired tasks for a DB2 system is equal to (the
degree of parallelism) times (the number of processors on
the DB2 system divided by the total number of processors).
For example, in FIGS. 5A-5C, for DB2 system A, the
number of desired tasks (8) is equal to (20) times (8/20).

Information Table 530 of the example shows that the DB2
system with the faster processors gets more work than the
DB2 systems with slower processors. This allows the work
to be divided up so that each processor will finish in the same
amount of time and prevents the slower processors from
becoming a bottleneck.
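
The determinations made for DB2 System A in this example can be retraced with the following Python sketch. The totals (20 processors, 2800 MIPS, 100 partitions, degree of parallelism of 20) and the System A values (8 processors at 200 MIPS) are taken from the example; the variable and function names are hypothetical. The unrounded amount of work for System A comes out near 57.14, close to the 57.13 quoted above, presumably because the quoted figure was obtained from rounded intermediate values.

```python
TOTAL_PROCESSORS = 20       # total number of processors 510
TOTAL_MIPS = 2800           # total MIPS 512
PARTITIONS = 100            # partitions of the referenced table
DEGREE_OF_PARALLELISM = 20  # CPU bound: taken equal to the processor count

avg_mips = TOTAL_MIPS / TOTAL_PROCESSORS        # avgMIPS
bound_val = PARTITIONS / DEGREE_OF_PARALLELISM  # boundVal


def system_share(processors, mips_per_processor):
    """Apply the four bind-time determinations to one DB2 system."""
    scale = mips_per_processor / avg_mips  # procScaleFactor
    avg_work = processors * bound_val      # avgWork
    system_work = scale * avg_work         # work distributed to the system
    per_processor = system_work / processors
    desired_tasks = DEGREE_OF_PARALLELISM * processors / TOTAL_PROCESSORS
    return scale, avg_work, system_work, per_processor, desired_tasks


# DB2 System A: eight processors, each executing 200 MIPS.
scale_a, avg_work_a, work_a, per_proc_a, tasks_a = system_share(8, 200)
```

The same function could be applied to DB2 Systems B and C once their processor counts, which are given only in the figures, are supplied.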

FIGS. 6A-6C illustrate an example of the apportioning
system 124 performing a CPU bound query during a run-time
phase in which an additional processor has been added since the
bind-time phase. Suppose another DB2 System, such as
DB2 System D, is brought up after bind-time. This would
mean more processors could be used to apportion work, and
so the apportioning system 124 re-determines the scaling
factors for all of the DB2 systems. Since the total number of
processors is 25, instead of 20, this would lead to each processor
working on 4 partitions in a homogeneous SMP environment
in which work is CPU bound (i.e., 100 partitions/25
processors = 4 partitions/processor).

Information Table 600 provides the names of the four 
DB2 systems 602 that could execute work, the number of 
processors 604 of each DB2 system 602, the MIPS 606 
executed by each processor 604 of each DB2 system 602, 
and the total MIPS 608 for each DB2 system 602. 

Using the total number of processors 610 for all of the 
systems and the total MIPS 612 for all of the DB2 systems, 
the apportioning system 124 determines that the average 
MIPS is as follows: 

avgMIPS = 3050/25 = 122 MIPS

Next, the apportioning system 124 determines scaling 
factors for each DB2 system. Information Table 620 iden- 
tifies the DB2 systems 622 and their scaling factors 624. 
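
As a rough illustration of the re-determination, the following sketch recomputes the average MIPS and the scaling factors with DB2 System D included. The configuration of System D (5 processors at 50 MIPS) and those of Systems B and C are assumptions inferred from the totals of 25 processors and 3050 MIPS; the actual values appear only in the figures.

```python
# Assumed run-time configuration: (processors, MIPS per processor).
systems = {"A": (8, 200), "B": (8, 100), "C": (4, 100), "D": (5, 50)}

total_procs = sum(n for n, _ in systems.values())            # 25
total_mips = sum(n * mips for n, mips in systems.values())   # 3050
avg_mips = total_mips / total_procs                          # 122.0

# Scaling factor: per-processor MIPS relative to the average processor.
scaling = {name: mips / avg_mips for name, (_, mips) in systems.items()}

for name, factor in scaling.items():
    print(f"System {name}: scaling factor {factor:.3f}")
```

Because the added system lowers the average, the scaling factor of every existing system rises relative to its bind-time value.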

Information Table 630 identifies the DB2 systems 632, 
along with the determination of the amount of work for an 
average processor 634. In a homogeneous SMP environment
in which the work is CPU bound, the apportioning system 
124 determines the amount of work for an average processor 
based on the number of processors and a bound value that 
reflects that the work is CPU bound. The apportioning 
system 124 determines the bound value by dividing the 
number of partitions (i.e., 100) by the number of processors. 
Thus, the apportioning system 124 determines that there are 
to be 4 partitions/processor, which is the bound value. The 
amount of work for an average processor 634 is the number
of processors of the computer system times the bound value.

Once the amount of work for an average processor is 
determined, the apportioning system 124 determines the 
amount of work to be distributed to each DB2 system 636. 
The amount to be distributed to each DB2 system is the 
scaling factor for that computer system times the amount of 
work for an average processor. Additionally, each processor 
in the DB2 system gets an amount of work equal to the 
amount of work distributed to the DB2 system divided by 
the number of processors for that DB2 system. The desired 
tasks 638 are the number of tasks to be distributed to each
DB2 system. 

Information Table 630 of the example shows that even 
when a DB2 System with slower processors is added 
between bind-time and run-time, the amount of work is 
redistributed appropriately based on the relative processor
speeds of all available processors.
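
The redistribution between bind-time and run-time can be sketched by applying the same determination to both configurations. The helper `system_work` is hypothetical, and the per-system processor counts and MIPS ratings are assumptions inferred from the totals quoted in this description.

```python
def system_work(systems, partitions=100):
    """Partitions distributed to each system: scaling factor x processors x bound value."""
    total_procs = sum(n for n, _ in systems.values())
    avg_mips = sum(n * m for n, m in systems.values()) / total_procs
    bound_value = partitions / total_procs
    return {name: (m / avg_mips) * n * bound_value
            for name, (n, m) in systems.items()}


bind_time = {"A": (8, 200), "B": (8, 100), "C": (4, 100)}
run_time = dict(bind_time, D=(5, 50))    # System D brought up after bind-time

before = system_work(bind_time)
after = system_work(run_time)
for name in run_time:
    print(f"System {name}: bind-time={before.get(name, 0.0):.2f} "
          f"run-time={after[name]:.2f}")
```

In this sketch the slower System D takes on a small share of the partitions, and the shares of Systems A, B, and C shrink proportionally while still totaling 100.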

FIGS. 7A-7C illustrate an example of the
apportioning system 124 performing a CPU bound query 
during a run-time phase with a limit on the buffer space. 
Suppose there is no more buffer space on DB2 System C. 
When buffer space is not available, work cannot be pro- 
cessed on that computer system, so there is no reason to use 
that computer system's processors for apportioning work. 
Therefore, none of the processors of System C are used, and 
the scaling factors for the remaining DB2 systems must be 
re-determined. Since the number of allowed tasks is 16 in
this CPU bound environment, this would lead to each task
working on 6.25 partitions (i.e., 100 partitions/16 tasks =
6.25 partitions/task) in a homogeneous SMP environment.

Information Table 700 provides the names of the DB2 
systems 702 that could execute work, the number of pro- 
cessors 704 of each DB2 system 702, the MIPS 706 
executed by each processor 704 of each DB2 system 702, 
and the total MIPS 708 for each DB2 system 702. 

Using the total number of processors 710 for all of the 
computer systems and the total MIPS 712 for all of the DB2 
systems, the apportioning system 124 determines that the 
average MIPS is as follows:

avgMIPS = 2400/16 = 150 MIPS

Next, the apportioning system 124 determines scaling 
factors for each DB2 system. Information Table 720 iden- 
tifies the DB2 systems 722 and their scaling factors 724. 
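
A minimal sketch of excluding a system that has no buffer space before re-determining the scaling factors is shown below. The `buffer_available` flags are a hypothetical stand-in for however buffer availability is actually detected, and the processor counts for Systems B and C are assumptions inferred from the quoted totals.

```python
# Assumed configuration: (processors, MIPS per processor).
systems = {"A": (8, 200), "B": (8, 100), "C": (4, 100)}
buffer_available = {"A": True, "B": True, "C": False}   # hypothetical flags

# Only systems with buffer space take part in the apportionment.
usable = {name: cfg for name, cfg in systems.items() if buffer_available[name]}

total_procs = sum(n for n, _ in usable.values())         # 16
total_mips = sum(n * m for n, m in usable.values())      # 2400
avg_mips = total_mips / total_procs                      # 150.0
scaling = {name: m / avg_mips for name, (_, m) in usable.items()}

print(f"avgMIPS = {avg_mips:.0f}")
for name, factor in scaling.items():
    print(f"System {name}: scaling factor {factor:.3f}")
```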

Information Table 730 identifies the DB2 systems 732, 
along with the determination of the amount of work for an 
average processor 734. In a homogeneous SMP environment
in which the work is CPU bound, the apportioning system 
124 determines the amount of work for an average processor 
based on the number of processors and a bound value that 
reflects that the work is CPU bound. The apportioning 
system 124 determines the bound value by dividing the 
number of partitions (i.e., 100) by the number of processors. 
Thus, the apportioning system 124 determines that there are 
to be 6.25 partitions/processor, which is the bound value. 
The amount of work for an average processor is the number 
of processors of a computer system times the bound value. 

Once the amount of work for an average processor is 
determined, the apportioning system 124 determines the 
amount of work to be distributed to each DB2 system 736. 
The amount to be distributed to each DB2 system is the 
scaling factor for that computer system times the number of 
processors in the computer system. Additionally, each pro- 
cessor in the DB2 system gets an amount of work equal to 
the amount of work distributed to the DB2 system divided 
by the number of processors for that DB2 system. The 
desired tasks 738 are the number of tasks to be distributed
to each DB2 system. 

Information Table 730 of the example shows that when a
DB2 system has a buffer space shortage at run-time, the
number of tasks is reduced.

Information Table 700 of the example shows that when
the allowed number of tasks falls below a DB2 system's
number of processors, the scaling factors must be re-determined.
It also shows that the amount of work for the average 
processor is divided equally between both DB2 systems 
since they have the same number of processors. However, 
when the scaling factors for each DB2 system are used, the 
amount of work for the actual processors causes the DB2 
system with the faster processors to get more work. 
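
The contrast between the equal split for average processors and the scaled split for the actual processors can be sketched as follows. The configuration of two 8-processor systems at 200 and 100 MIPS is an assumption consistent with the totals of 16 processors and 2400 MIPS; the variable names are illustrative only.

```python
partitions, allowed_tasks = 100, 16
usable = {"A": (8, 200), "B": (8, 100)}          # assumed: System C excluded

bound_value = partitions / allowed_tasks         # 6.25 partitions per processor
avg_mips = sum(n * m for n, m in usable.values()) / allowed_tasks   # 150.0

# Work for average processors: equal, since both systems have 8 processors.
avg_work = {name: n * bound_value for name, (n, _) in usable.items()}

# Work for the actual processors: scaled by relative processor speed.
actual_work = {name: (m / avg_mips) * avg_work[name]
               for name, (_, m) in usable.items()}

print("average:", avg_work)
print("actual: ", {k: round(v, 2) for k, v in actual_work.items()})
```

Under these assumptions each system would receive 50 partitions if its processors were average, but the scaled split gives about 66.67 to System A and 33.33 to System B.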

FIGS. 8A-8C illustrate an example of the apportioning 
system performing an I/O bound query during a bind-time 
phase. In this example, the database table that is referenced 
in a query is divided into 100 partitions (e.g., datasets or 
files). To perform the query, the apportioning system 124 
will submit instructions to different processors to perform 
the query on different partitions of the database table.

Information Table 800 provides the names of the DB2
systems 802 that could execute work, the number of pro- 
cessors 804 of each DB2 system 802, the MIPS 806 
executed by each processor 804 of each DB2 system 802, 
and the total MIPS 808 for each DB2 system 802. For 
example, DB2 System A has eight processors, each execut- 
ing 200 MIPS, with a total processing power of 1600 MIPS. 

Using the total number of processors 810 (i.e., 20) for all 
of the systems and the total MIPS 812 (i.e., 2800) for all of 
the DB2 systems, the apportioning system 124 determines 
that the average MIPS is as follows: 

avgMIPS = 2800/20 = 140 MIPS

Next, the apportioning system 124 determines scaling 
factors for each DB2 system. Information Table 820 iden- 
tifies the DB2 systems 822 and their scaling factors 824. For 
example, DB2 System A has a scaling factor of 1.428 (i.e., 
200 MIPS/140 MIPS), which reflects the higher processing
power of DB2 System A relative to DB2 System B with a 
scaling factor of 0.714 and DB2 System C with a scaling 
factor of 0.714. 

Information Table 830 identifies the DB2 systems 832,
along with the determination of the amount of work for an
average processor 834. The apportioning system 124 deter-
mines the number of tasks for an average processor when
work is I/O bound differently than when work is CPU
bound. However, the bound value is still based on the
number of processors regardless of whether the work is CPU
bound or I/O bound.

In particular, in a homogeneous SMP environment in
which the workload is I/O bound, the apportioning system
124 determines the amount of work for an average processor
based on the number of processors and a bound value that
reflects that the system is I/O bound. The apportioning
system 124 determines the bound value by dividing the
number of partitions (i.e., 100) by the number of processors.
Assuming that there are to be 5 tasks per processor, since
there are 20 processors, the number of tasks used for the
determination is 100. Thus, the apportioning system 124
determines that there is to be 1 partition/task and the bound
value is 5 partitions/processor. The average amount of work
for processors in a DB2 system 834 is the number of
processors of a system times the bound value. For example,
DB2 System A has 8 processors and the bound value is 5, so
40 is an indicator of the amount of work of average
processors in the DB2 System A.
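
The I/O bound determination above can be sketched as a short calculation. The function name `io_bound_value` is hypothetical, and the figure of 5 tasks per processor is the assumption stated in the example rather than a fixed property of the technique.

```python
def io_bound_value(partitions, total_procs, tasks_per_processor):
    """Return (tasks, partitions_per_task, bound_value) for I/O bound work."""
    tasks = tasks_per_processor * total_procs                  # 5 * 20 = 100
    partitions_per_task = partitions / tasks                   # 100 / 100 = 1
    bound_value = partitions_per_task * tasks_per_processor    # partitions/processor
    return tasks, partitions_per_task, bound_value


tasks, per_task, bound_value = io_bound_value(
    partitions=100, total_procs=20, tasks_per_processor=5)

# Amount of work of average processors in DB2 System A (8 processors).
avg_work_system_a = 8 * bound_value

print(tasks, per_task, bound_value, avg_work_system_a)
```

The resulting bound value equals the number of partitions divided by the number of processors, which is why it matches the CPU bound case even though the number of tasks differs.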

Once the amount of work for an average processor is
determined, the apportioning system 124 determines the
amount of work to be distributed to each DB2 system 836.
The amount to be distributed to each DB2 system is the
scaling factor for that system times the amount of work for
an average processor. For example, for DB2 System A, the
amount of work to be distributed to processors in DB2
System A is equal to the scaling factor of 1.428 times the
amount of work average processors in the system could
perform, which is 40. Additionally, each processor in the
DB2 system gets an amount of work equal to the amount of
work distributed to the DB2 system divided by the number
of processors for that DB2 system. For example, for DB2
System A, each processor gets 7.14 of the work (i.e.,
57.13/8). The desired tasks 838 are the number of tasks to be
distributed to each DB2 system. The number of desired tasks
for a DB2 system is equal to (the degree of parallelism)
times (the number of processors on the DB2 system divided
by the total number of processors).

Information Table 830 of the example shows that the DB2
system with the faster processors gets more work than the
DB2 systems with slower processors. This allows the work
to be divided up so that each processor will finish in the same
amount of time and prevents the slower processors from
becoming a bottleneck.

Conclusion

This concludes the description of the preferred embodi-
ment of the invention. The following describes some alter-
native embodiments for accomplishing the present inven-
tion. For example, any type of computer system, such as a
mainframe, minicomputer, or personal computer, or com-
puter configuration, such as a timesharing mainframe, local
area network, or standalone personal computer, could be
used with the present invention.

In summary, the present invention discloses a method,
apparatus, and article of manufacture for a computer-
implemented apportioning system. The present invention
provides an improved technique for distributing work across
processors. Additionally, the present invention provides a
technique for distributing work across computer systems
having one or more processors and connected by a network.
Moreover, the present invention provides a technique for
distributing work across processors so that each of the
processors completes processing at approximately the same
time.

The foregoing description of the preferred embodiment of
the invention has been presented for the purposes of illus-
tration and description. It is not intended to be exhaustive or
to limit the invention to the precise form disclosed. Many
modifications and variations are possible in light of the
above teaching. It is intended that the scope of the invention
be limited not by this detailed description, but rather by the
claims appended hereto.

What is claimed is: 

1. A method of distributing work to processors in a 
multi-processor system, the method comprising: 

during bind-time, 

determining an average processing power for each
processor, said average processing power being the
sum of the processing powers of the processors
divided by the number of processors;

thereafter determining a scaling factor for each
processor, wherein the scaling factor represents pro-
cessing power of a processor divided by the average
processing power; and

distributing portions of a total amount of work to each
processor based on the determined scaling factor of
that processor and a determined amount of work for
an average processor.

2. The method of claim 1, wherein distributing portions of
a total amount of work to each processor based on the
determined scaling factor of that processor further com-
prises:

determining an amount of work that can be handled by an
average processor; and

distributing portions of a total amount of work to each
processor based on the determined scaling factor of that
processor and a determined amount of work for an
average processor.

3. The method of claim 2, wherein determining an amount 
of work that can be handled by an average processor further 
comprises: 

determining whether the work is I/O bound or CPU
bound; and

determining the amount of work for an average processor
based on a selected number of processors and a value
based on the determination as to how the work is
bound.

4. The method of claim 1, further comprising, during 
run-time, re-determining the portions of work to be distrib- 
uted to each processor based on a change in resources. 

5. The method of claim 4, wherein the change in resources
is a change in the number of processors.

6. The method of claim 4, wherein the change in resources 
is a change in an amount of storage to be used by the 
processors. 

7. The method of claim 1, wherein the multiple processors
are on different systems connected by a network.

8. The method of claim 1, wherein the multiple processors 
are part of one system. 

9. An apparatus for distributing work to processors in a 
multi-processor system, comprising: 

a computer having a data storage device connected
thereto, wherein the data storage device stores a data-
base containing the partitioned data;

one or more computer programs, performed by the
computer, for, during binding:

determining an average processing power for each 
processor, said average processing power being the 
sum of the processing powers of the processors 
divided by the number of processors; 

thereafter determining a scaling factor for each 
processor, wherein the scaling factor represents pro- 
cessing power of a processor divided by the average 
processing power; and 

distributing portions of a total amount of work to each 
processor based on the determined scaling factor of 
that processor and a determined amount of work for 
an average processor. 

10. The apparatus of claim 9, wherein the scaling factor 
is determined by the processing power of a processor 
divided by the average processing power. 

11. The apparatus of claim 9, wherein the means for 
distributing portions of a total amount of work to each 
processor based on the determined scaling factor of that 
processor further comprises: 

means for determining an amount of work that can be
handled by an average processor; and

means for distributing portions of a total amount of work
to each processor based on the determined scaling
factor of that processor and a determined amount of
work for an average processor.

12. The apparatus of claim 11, wherein the means for 
determining an amount of work that can be handled by an 
average processor further comprises: 

means for determining whether the work is I/O bound or
CPU bound; and

means for determining the amount of work for an average
processor based on a selected number of processors and
a value based on the determination as to how the work
is bound.

13. The apparatus of claim 9, further comprising, during 
run-time, means for re-determining the portions of work to 
be distributed to each processor based on a change in 
resources. 

14. The apparatus of claim 13, wherein the change in 
resources is a change in the number of processors. 

15. The apparatus of claim 13, wherein the change in 
resources is a change in an amount of storage to be used by 
the processors. 

16. The apparatus of claim 9, wherein the multiple pro- 
cessors are on different systems connected by a network. 

17. The apparatus of claim 9, wherein the multiple pro- 
cessors are part of one system. 

18. The apparatus of claim 9, wherein the one or more 
computer programs performed by the computer that distrib- 
ute portions of a total amount of work to each processor 
based on the determined scaling factor of that processor 
further comprise: 

one or more computer programs that determine an amount 
of work that can be handled by an average processor; 
and 

one or more computer programs that distribute portions of 
a total amount of work to each processor based on the 
determined scaling factor of that processor and a deter- 
mined amount of work for an average processor. 

19. The apparatus of claim 18, wherein the one or more 
computer programs performed by the computer that deter- 
mine an amount of work that can be handled by an average 
processor further comprise: 

one or more computer programs performed by the com- 
puter that determine whether the work is I/O bound or 
CPU bound; and

one or more computer programs performed by the com-
puter that determine the amount of work for an average
processor based on a selected number of processors and
a value based on the determination as to how the work
is bound.

20. The apparatus of claim 9, further comprising one or 
more computer programs performed by the computer during 
run-time that re-determine the portions of work to be dis- 
tributed to each processor based on a change in resources. 

21. An article of manufacture comprising a program
carrier readable by a computer and embodying one or more
instructions executable by a computer to perform a method
that distributes work to processors in a multi-processor
system, in a database stored in a data storage device connected
to the computer, the method comprising:

during bind-time:

determining an average processing power for each 
processor, said average processing power being the 
sum of the processing powers of the processors 
divided by the number of processors; 

thereafter determining a scaling factor for each
processor, wherein the scaling factor represents pro-
cessing power of a processor divided by the average
processing power; and

distributing portions of a total amount of work to each
processor based on the determined scaling factor of
that processor and a determined amount of work for
an average processor.

22. The article of manufacture of claim 21, wherein
distributing portions of a total amount of work to each
processor based on the determined scaling factor of that
processor further comprises:

determining an amount of work that can be handled by an
average processor; and

distributing portions of a total amount of work to each
processor based on the determined scaling factor of that
processor and a determined amount of work for an
average processor.

23. The article of manufacture of claim 22, wherein
determining an amount of work that can be handled by an
average processor further comprises:

determining whether the work is I/O bound or CPU
bound; and

determining the amount of work for an average processor
based on a selected number of processors and a value
based on the determination as to how the work is
bound.

24. The article of manufacture of claim 21, further
comprising, during run-time, re-determining the portions of
work to be distributed to each processor based on a change
in resources.

25. The article of manufacture of claim 21, wherein the 
change in resources is a change in the number of processors. 

26. The article of manufacture of claim 21, wherein the 
change in resources is a change in an amount of storage to 
be used by the processors.

27. The article of manufacture of claim 21, wherein the 
multiple processors are on different systems connected by a 
network. 

28. The article of manufacture of claim 21, wherein the 
multiple processors are part of one system.

29. A method of distributing work to processors in a 
multi-processor system, the method comprising: 

during bind-time,

determining a scaling factor for each processor,
wherein the scaling factor represents relative pro-
cessing power in relation to each other processor;
and

distributing portions of a total amount of work to each
processor based on the determined scaling factor of
that processor and a determined amount of work for
an average processor.

30. The method of claim 29, wherein the method further 
comprises determining an average processing power for 
each processor before determining a scaling factor. 

31. The method of claim 30, wherein each processor has 
an associated processing power and the average processing 
power is the sum of the processing powers of the processors 
divided by the number of processors. 

32. The method of claim 31, wherein the scaling factor is 
determined by the processing power of a processor divided 
by the average processing power. 

33. An apparatus for distributing work to processors in a 
multi-processor system, comprising: 

a computer having a data storage device connected 
thereto, wherein the data storage device stores a data- 
base containing the partitioned data; 

one or more computer programs, performed by the 
computer, for, during bind-time, determining a scaling 
factor for each processor, wherein the scaling factor 
represents relative processing power in relation to each 
other processor, and distributing portions of a total 
amount of work to each processor based on the deter- 
mined scaling factor of that processor and a determined 
amount of work for an average processor. 

34. The apparatus of claim 33, further comprising means 
for determining an average processing power for each 
processor before determining a scaling factor. 

35. The apparatus of claim 34, wherein each processor has 
an associated processing power and the average processing 
power is the sum of the processing powers of the processors 
divided by the number of processors. 

36. The apparatus of claim 33, wherein the one or more 
computer programs performed by the computer determine 
an average processing power for each processor before 
determining a scaling factor. 

37. An article of manufacture comprising a computer 
program carrier readable by a computer and embodying one 
or more instructions executable by the computer to perform 
a method that distributes work to processors in a multi- 
processor system, in a database stored in a data storage 
device connected to the computer, the method comprising: 

during bind-time, 

determining a scaling factor for each processor, 
wherein the scaling factor represents relative pro- 
cessing power in relation to each other processor; 
and 

distributing portions of a total amount of work to each 
processor based on the determined scaling factor of 
that processor and a determined amount of work for 
an average processor. 

38. The article of manufacture of claim 37, wherein the 
method further comprises determining an average process- 
ing power for each processor before determining a scaling 
factor. 

39. The article of manufacture of claim 38, wherein each 
processor has an associated processing power and the aver- 
age processing power is the sum of the processing powers of 
the processors divided by the number of processors. 

40. The article of manufacture of claim 39, wherein the 
scaling factor is determined by the processing power of a 
processor divided by the average processing power. 

41. A method of distributing work during bind-time to
processors in a multi-processor system, the method com-
prising:

determining a scaling factor for each processor, wherein
the scaling factor represents relative processing power
in relation to each other processor;

distributing portions of the total amount of work based on
the determined scaling factor of that processor and a
determined amount of work for an average processor,
wherein the processors having scaling factors that are
magnitudes of order larger than the other processors
receive all the distributed portions of work.

42. An apparatus that distributes work during bind-time to
processors in a multi-processor system, the apparatus com-
prising:

a computer having a data storage device connected
thereto, wherein the data storage device stores a data-
base containing partitioned data;

one or more computer programs, performed by the
computer, for, during bind-time, determining a scaling
factor for each processor, wherein the scaling factor
represents relative processing power in relation to each
other processor, and distributing portions of the total
amount of work based on the determined scaling factor
of that processor and a determined amount of work for
an average processor, wherein the processors having
scaling factors that are magnitudes of order larger than
the other processors receive all the distributed portions
of work.

43. An article of manufacture comprising a computer
program carrier readable by a computer and embodying one
or more instructions executable by the computer to perform
a method that distributes work during bind-time to proces-
sors in a multi-processor system, the method comprises:

determining a scaling factor for each processor, wherein
the scaling factor represents relative processing power
in relation to each other processor;

distributing portions of the total amount of work based on
the determined scaling factor of that processor and a
determined amount of work for an average processor,
wherein the processors having scaling factors that are
magnitudes of order larger than the other processors
receive all the distributed portions of work.

* * * * * 